Genome-wide identification and phenotypic characterization of seizure-associated copy number variations in 741,075 individuals

Copy number variants (CNVs) are established risk factors for neurodevelopmental disorders with seizures or epilepsy. With the hypothesis that seizure disorders share genetic risk factors, we pooled CNV data from 10,590 individuals with seizure disorders, 16,109 individuals with clinically validated epilepsy, and 492,324 population controls and identified 25 genome-wide significant loci, 22 of which are novel for seizure disorders, such as deletions at 1p36.33, 1q44, 2p21-p16.3, 3q29, 8p23.3-p23.2, 9p24.3, 10q26.3, 15q11.2, 15q12-q13.1, 16p12.2, and 17q21.31, duplications at 2q13, 9q34.3, 16p13.3, 17q12, 19p13.3, and 20q13.33, and reciprocal CNVs at 16p11.2 and 22q11.21. Using genetic data from an additional 248,751 individuals with 23 neuropsychiatric phenotypes, we explored the pleiotropy of these 25 loci. Finally, in a subset of individuals with epilepsy and detailed clinical data available, we performed phenome-wide association analyses between individual CNVs and clinical annotations categorized through the Human Phenotype Ontology (HPO). For six CNVs, we identified 19 significant associations with specific HPO terms and generated, for all CNVs, phenotype signatures across 17 clinical categories relevant for epileptologists. This is the most comprehensive investigation of CNVs in epilepsy and related seizure disorders, with potential implications for clinical practice.

An epileptic seizure is a paroxysm of symptoms and signs due to abnormally excessive or synchronous neuronal activity 1 . Based on their clinical characteristics and electroencephalogram (EEG) findings, seizures are classified as focal-onset seizures (which start in a specific brain region) or generalized-onset seizures (which rapidly engage bilaterally distributed networks) 1,2 . This classification is useful because it helps categorize epilepsy into syndromes and allows clinicians to draw inferences about disease etiology, trajectory, and response to medication. Clinical manifestations range from whole-body convulsions with loss of consciousness (tonic-clonic seizures), to movements involving only part of the body with variable levels of consciousness (focal motor seizures), to a brief loss of awareness (absence seizures) 1,2 . Seizures can be provoked by head trauma, infection, or acute toxic-metabolic imbalance, or they can be spontaneous and unprovoked. Individuals who have had at least one unprovoked seizure and have an enduring elevated risk of further seizures, or who have the electroclinical features of one of the few epilepsy syndromes that can be diagnosed without recurrent seizures, fulfill the criteria for a diagnosis of epilepsy 1 .
Seizures and epilepsy are common in the general population. Neonatal seizures occur in 1.5% of neonates, febrile seizures in 2-4% of young children, and epilepsy in up to 1% of children and adolescents 3 . Seizures are common among individuals with neurodevelopmental disorders, affecting 21.5% of those with autism and intellectual disability and 8% of those with autism without intellectual disability 4 .
Copy number variants (CNVs), such as deletions and duplications, change the dosage of genomic segments and are established risk factors for various types of epilepsy [5][6][7][8][9][10][11][12][13][14] , seizures 15 , and neuropsychiatric disorders [16][17][18][19] . Large CNVs can affect multiple dosage-sensitive genes, leading to complex clinical presentations. To date, only one hypothesis-free genome-wide CNV association study (CNV-GWAS) has been reported for epilepsy 20 . This CNV-GWAS in 10,712 individuals with epilepsy and 6,746 controls identified three genome-wide significant CNVs 20 . High-resolution CNV screening has become routine in clinical molecular diagnostics, leading to greater detection of chromosomal abnormalities in patients 21 . Diagnostic CNVs can be identified in 1-4% of individuals with epilepsy and >10% of those with seizures and neurodevelopmental disorders 13,[20][21][22] . However, the pleiotropy of pathogenic CNVs, partially driven by structural properties (size, fixed vs. variable breakpoints, number of affected genes), represents a significant challenge in the clinical interpretation of CNVs, limiting their utility for disorder classification, prognostication, and the development of precision medicine treatments that specifically target the critical pathogenic gene(s) altered by the CNV. The majority of pathogenic and likely pathogenic CNVs are greater than 1 megabase (Mb) in size, and it is often unclear which gene(s) or genomic element(s) affected by the CNV contribute to one or more disorders 23,24 . A well-powered seizure CNV discovery screen combined with detailed genotype-phenotype analyses could identify genomic segments that confer risk for seizures, identify clinical characteristics in affected patients and consequently guide genetic test interpretation.
Although many individuals with neuropsychiatric and developmental disorders have comorbid seizures, genome-wide CNV association analyses spanning epilepsy and other seizure disorders have yet to be reported. We hypothesized that genetic risk for seizures is shared between individuals with epilepsy diagnosed according to International League Against Epilepsy (ILAE) criteria 1 and individuals with related neurological and neurodevelopmental disorders who also have seizures. If so, a joint analysis could add to the three epilepsy-associated CNV loci reported previously 20 . To test this hypothesis, we performed a meta-analysis of CNV-GWASs comprising 26,699 individuals with diagnosed epilepsy or seizures and 492,324 controls. Since both definitions are based on the presence of seizures, we hereafter refer to individuals affected by either condition as individuals with seizures. The effective sample size of this study (N eff = 101,302) provides adequate power to detect risk CNVs that are also present in the general population and are therefore incompletely penetrant. However, for quality-control purposes, the analysis was restricted to CNVs with a frequency of at most 1% in the general population. We assessed the pleiotropy of each seizure-associated CNV in subsequent meta-analyses of epilepsy with 238,161 independent individuals affected by any of 23 neuropsychiatric disorders. Finally, using a subset of the seizure cohort comprising 10,880 individuals with epilepsy phenotyped with 214,203 Human Phenotype Ontology (HPO) annotations 25 , we evaluated the clinical features characterizing carriers of each seizure-associated CNV.
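The reported N eff can be reproduced from the pooled case and control totals with the widely used case-control formula N_eff = 4 / (1/N_cases + 1/N_controls). The section does not state which definition was used, so treat this as an assumption that happens to match the reported value; the function name is illustrative.

```python
def effective_sample_size(n_cases: int, n_controls: int) -> float:
    """Case-control effective sample size, 4 / (1/N_cases + 1/N_controls).

    Roughly the total N of a balanced design carrying the same statistical information.
    """
    return 4.0 / (1.0 / n_cases + 1.0 / n_controls)


n_cases = 16_109 + 10_590       # Epi25 epilepsy + neuropsychiatric-cohort seizures
n_controls = 8_545 + 483_779    # Epi25 controls + neuropsychiatric-cohort controls
print(n_cases, n_controls)                                # 26699 492324
print(round(effective_sample_size(n_cases, n_controls)))  # 101302
```

Because controls vastly outnumber cases, N eff is already close to its ceiling of 4 × N_cases (106,796 here), so adding further controls would gain little power compared with adding cases.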

Discovery of 25 genome-wide significant seizure-associated CNV regions
We performed a meta-analysis combining 16,109 individuals with epilepsy and 8545 population controls (the Epi25 Collaborative cohort) with 10,590 individuals with seizures (not explicitly meeting diagnostic criteria for epilepsy) and 483,779 population controls derived from an aggregated CNV dataset of 17 cohorts (the neuropsychiatric disorders cohort; all cohorts of this study are listed in Supplementary Table 1). The genome was scanned using 267,237 overlapping genomic segments of 200 kb in a sliding-window approach with a 10-kb step 26 . After Bonferroni correction of the significance threshold in the meta-analysis and subsequent fine-mapping, we identified 25 loci associated with seizures at genome-wide significance (P ≤ 3.74 × 10⁻⁶). All 25 loci are shown in Fig. 1 and detailed in Table 1. They comprise 15 deletions (size range: 230 kb to 5 Mb) and ten duplications (size range: 290 kb to 8.9 Mb). All genome-wide significant deletions involved the loss of one copy, and all duplications involved the gain of one copy. Three of the 25 seizure-associated loci (15q11.2-q13.3 dup, 15q13.2-q13.3 del, 16p13.11 del) had genome-wide statistical support for an association with epilepsy in our previous study 20 , which included 40% of the individuals with seizures in this study. All other identified CNVs (22/25, 88%) represent new genome-wide significant loci for seizures: 10/22 (45%) were previously implicated in neurological and psychiatric disorders, 6/22 (27%) specifically in epilepsy by studies without genome-wide statistical support, 2/22 (9%) were reported in individuals without neurological or psychiatric disorders, and 4/22 (18%) were not previously reported. Table 2 details all commonly reported disease phenotypes for the 25 identified seizure-associated loci.
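The genome scan tiles each chromosome with 200-kb windows; assuming the 10 kb refers to the step between consecutive windows, adjacent windows overlap by 190 kb, so their association tests are strongly correlated. The sketch below shows this tiling for a single sequence. It is illustrative only: it assumes windows must fit entirely inside the sequence and ignores assembly gaps and centromeres, so it is not expected to reproduce the 267,237 segments used here. Note also that a naive Bonferroni correction over all 267,237 windows would give 0.05/267,237 ≈ 1.9 × 10⁻⁷, well below the reported threshold of 3.74 × 10⁻⁶, which suggests the correction counted a smaller effective number of tests; this section does not say how that number was derived.

```python
from typing import Iterator, Tuple


def sliding_windows(seq_length: int, window: int = 200_000,
                    step: int = 10_000) -> Iterator[Tuple[int, int]]:
    """Yield half-open (start, end) windows of fixed size, advanced by `step` bp."""
    start = 0
    while start + window <= seq_length:
        yield start, start + window
        start += step


def n_windows(seq_length: int, window: int = 200_000, step: int = 10_000) -> int:
    """Closed-form count of the windows produced by sliding_windows()."""
    return 0 if seq_length < window else (seq_length - window) // step + 1


wins = list(sliding_windows(1_000_000))
print(len(wins), n_windows(1_000_000))  # 81 81
print(wins[0], wins[-1])                # (0, 200000) (800000, 1000000)
```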
Our meta-analysis of seizure disorders was likely underpowered to detect some CNVs previously implicated in epilepsy without genome-wide statistical support (e.g., 1q21.1 del/dup). Reciprocal CNVs, defined as seizure-associated deletions and duplications involving overlapping genomic segments, were found at 15q11.2, 16p11.2, and 22q11.21. None of the seizure-associated CNV regions identified in this study overlapped with the loci of the most recent SNP-based GWAS in epilepsy 27 .

Fine-mapping and candidate genes
Of the three CNV regions with previous genome-wide statistical support, our fine-mapping approach narrowed down the critical seizure-relevant region of the known 15q11-q13 duplication to the imprinted promoter/exon 1 region of SNRPN (Table 2, Supplementary Fig. 1). The SNRPN promoter/exon 1 region was suggested to regulate the imprinting of the critical region for Prader-Willi syndrome 28,29 . Overexpression of SNRPN, corresponding to the seizure-associated duplication of the region, was found to cause abnormal neural development in cultured primary cortical neurons 30 . Conversely, SNRPN knockdown was found in the same study to also cause subtle neuronal abnormalities, in line with reports of short SNRPN deletions in Prader-Willi syndrome 31 . For the other two CNV regions with previous genome-wide statistical support, we identified several genes with a brain phenotype in the minimal credible intervals. The 15q13.2-q13.3 deletion credible interval includes the haploinsufficient gene OTUD7A, whose heterozygous loss was shown to cause abnormal development of cortical dendritic spines and dendrite outgrowth in Otud7a DEL/+ mice 32 , and KLF13, whose heterozygous loss was shown to cause a layer-specific decrease of cortical interneurons in Klf13 DEL/+ mice 33 . The 16p13.11 deletion credible interval includes two haploinsufficient genes: MYH11, implicated in cerebrovascular disorders 34,35 that are a risk factor for seizures 36 , and MARF1, involved in cortical neurogenesis 37 .
Of the six seizure-associated CNV regions previously implicated in epilepsy without genome-wide statistical support, we mapped the credible intervals of the two seizure-associated deletions at 1p36 to the first and third known critical regions for seizures within the phenotype spectrum of the 1p36 deletion syndrome 38 . Known disease genes in the credible intervals at 1p36 are DVL1 (Robinow syndrome 39 ), TMEM240 (Spinocerebellar ataxia 21 40 ), and SKI (Shprintzen-Goldberg  45 . Among the ten seizure-associated CNV regions previously reported in other neurological and psychiatric disorders, we identified one credible interval suggesting a different causal gene than previously reported: an interstitial 9q34.3 duplication that does not encompass EHMT1, which has been considered the causal gene based on one of 22 reported 9q34.3 duplication carriers 46 . The top candidate gene within the credible interval identified by our meta-analysis is GRIN1, which is affected by the 9q34.3 duplications of 21 of the reported carriers 46 . GRIN1 gain-of-function variants are known to cause a developmental and epileptic encephalopathy, often with polymicrogyria 47 . In contrast, our fine-mapping analysis confirms TBX1 as the (known) causal gene for the 22q11.21 deletion/DiGeorge syndrome 48 . We also found LZTR1 (Noonan syndrome 49 ) within the credible 22q11.21 deletion intervals. Other known disease genes in the credible intervals of the remaining CNV regions implicated in neurological and psychiatric disorders were NPHP1 inside a 2q13 duplication (autism and global developmental delay 50,51 ), KANK1 (cerebral palsy, spastic quadriplegic, 2 52 ) inside a small 9p24.3 DOCK8/KANK1 deletion, and NIPA1 (autosomal dominant spastic paraplegia 6 53 ) inside the 15q11.2 BP1-BP2 deletion syndrome region. Finally, we identified four novel CNV regions associated with seizures, three of which harbor known disease genes. The credible region of a non-canonical 16p13.3 duplication included STUB1; STUB1 gain of function was reported to cause an early-onset dementia syndrome 54 and autosomal dominant ataxia with cognitive decline and autism 55 . The credible region of a non-canonical 17q21.31 deletion included BRCA1. BRCA1 mutations are well known for their role in cancer 56 , and BRCA1 is a possible mediator of glioma cell proliferation, migration, and glioma stem cell self-renewal 57 . The credible region of a novel 20q13.33 duplication included KCNQ2 and EEF1A2. KCNQ2 gain of function is known to cause neurodevelopmental disability and neonatal encephalopathy 58,59 , and EEF1A2 gain of function was shown to cause neurodevelopmental disorders, including epilepsy and intellectual disability 60 .

Highlighted are: (1) Darkest grey: three CNV regions with previous genome-wide statistical support for epilepsy (PMID: 32568404); (2) Medium-dark grey: six CNV regions previously implicated in epilepsy without genome-wide statistical support; (3) Medium-light grey: ten CNV regions previously reported in other neurological and psychiatric disorders; and (4) Light grey: four novel CNV regions never reported in neurological or psychiatric disorders. In the second column, DEL and DUP indicate deletions and duplications, respectively. Gene names are formatted in italic.
Significantly enriched Gene Ontology (GO) Biological Processes among all known brain-related disease genes in the credible intervals were: chordate embryonic development ( . No GO Biological Process was significantly enriched when considering all genes inside all credible intervals, pointing to likely heterogeneous disease mechanisms across the 25 seizure-associated CNV regions. All credible intervals and known brain-related disease genes are detailed in Table 2; additional candidate genes of lower confidence are detailed in Supplementary Data 1; and all genes inside the credible intervals are detailed in Supplementary Data 2.
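This section does not name the enrichment method. A common choice for GO over-representation analysis is a one-sided hypergeometric test, and the sketch below assumes that test. The function name and the gene counts in the example are hypothetical, not the study's data.

```python
from math import comb


def hypergeom_enrichment_p(k: int, n_selected: int, n_term: int, n_universe: int) -> float:
    """One-sided P(X >= k) for X ~ Hypergeometric.

    k:          selected genes annotated to the GO term (observed overlap)
    n_selected: size of the selected gene set (e.g., disease genes in credible intervals)
    n_term:     genes in the background universe annotated to the GO term
    n_universe: all background genes considered
    """
    total = comb(n_universe, n_selected)
    upper = min(n_selected, n_term)
    tail = sum(comb(n_term, i) * comb(n_universe - n_term, n_selected - i)
               for i in range(k, upper + 1))
    return tail / total  # exact integer arithmetic until this final division


# Hypothetical example: 5 of 12 selected genes fall in a term covering 200 of 19,000 genes.
print(f"{hypergeom_enrichment_p(5, 12, 200, 19_000):.2e}")
```

In practice this P-value is computed for every tested term and then corrected for multiple testing across terms.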

Most of the 25 identified risk CNVs are pleiotropic
To explore the pleiotropy of the 25 identified CNVs, we performed 23 meta-analyses, each combining epilepsy with one of 23 other neuropsychiatric disorders (listed in Supplementary Table 2), in an additional 238,161 individuals with neuropsychiatric disorders and 492,324 controls. Of the 25 seizure-associated CNVs, 24 were significantly associated with a neuropsychiatric disorder in at least one of the 23 meta-analyses. For each CNV, the number of neuropsychiatric disorders with a significant association and the greatest odds ratio are reported in Table 1. Three-fifths of all CNVs (60%) were highly pleiotropic, showing significant associations in >10 epilepsy/neuropsychiatric disorder meta-analyses. The most frequently co-associated phenotype was "Neurodevelopmental abnormality" (HP:0012759 [https://hpo.jax.org/app/browse/term/HP:0012759]; associated with 36% of all seizure-associated CNVs).
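The pleiotropy summaries above are tallies over a table recording in which meta-analyses each CNV was significant. The sketch below assumes such a table is available as a Python dict; the function, CNV names, and phenotype labels are hypothetical placeholders, and the ">10" threshold follows the text.

```python
from collections import Counter
from typing import Dict, Set


def pleiotropy_summary(sig: Dict[str, Set[str]], high_threshold: int = 10) -> dict:
    """sig maps each CNV to the set of meta-analyses in which it was significant."""
    n = len(sig)
    n_assoc = {cnv: len(hits) for cnv, hits in sig.items()}
    # How many CNVs are significant in each meta-analysis (phenotype).
    phenotype_counts = Counter(p for hits in sig.values() for p in hits)
    top, top_n = phenotype_counts.most_common(1)[0] if phenotype_counts else (None, 0)
    return {
        "any_association": sum(1 for k in n_assoc.values() if k > 0),
        "frac_highly_pleiotropic": sum(1 for k in n_assoc.values() if k > high_threshold) / n,
        "top_phenotype": top,
        "frac_with_top_phenotype": top_n / n,
    }


# Hypothetical toy table with four CNVs and generic phenotype labels.
toy = {
    "cnvA": {f"pheno{i}" for i in range(12)},
    "cnvB": {"pheno0", "pheno1"},
    "cnvC": {"pheno0"},
    "cnvD": set(),
}
print(pleiotropy_summary(toy))
# {'any_association': 3, 'frac_highly_pleiotropic': 0.25, 'top_phenotype': 'pheno0', 'frac_with_top_phenotype': 0.75}
```

For the study's 25 CNVs, the reported 60% and 36% correspond to 15 and 9 CNVs, respectively.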

Characterization of the clinical subphenotypes enriched in the carriers of each seizure-associated CNV in epilepsy patients with deep phenotypes
We performed phenome-wide association analyses for each of the 33 credible intervals identified across the 25 CNV regions to characterize the high-resolution clinical manifestations associated with each CNV. This analysis was performed on a subset of the Epi25 Collaborative cohort (the Phenomic cohort, Supplementary Table 1) comprising 10,880 individuals with non-acquired epilepsy and deep phenotypic data (the clinical presentation of this cohort and the frequencies of selected common and characteristic epilepsy phenotypes are provided in Supplementary Table 3). In the Phenomic cohort, 562 individuals (5.2%) carried at least one seizure-associated credible interval (498 [4.6%] carried one credible interval and 64 [0.6%] carried 2-5 credible intervals). The most common credible interval (deletion at 2p21-p16.3) was carried by 114 individuals (1.0%), and 18 credible intervals were found in at least 0.1% of the cohort (≥11 carriers). One CNV (deletion at 9p24.3, containing a single credible interval) was not observed. Across the 32 detected credible intervals and 1667 annotated HPO concepts, we identified 622 nominally significant associations (two-sided Fisher's exact test, Supplementary Data 3). Given the large number of associations tested, and because HPO annotations describing the same clinical feature at different levels of precision are highly correlated, we applied the minP step-down procedure to aid interpretation 61 , yielding 19 associations robust to multiple testing within each genetically defined group (minP-adjusted P < 0.05, Table 3). We interrogated the phenotypic annotations of CNV carriers regarding the candidate genes prioritized in our fine-mapping analysis. MSH2 was prioritized as the candidate gene for the most common deletion in the Phenomic cohort (2p21-p16.3).
Heterozygous loss-of-function variants of the haploinsufficient gene MSH2 cause Lynch syndrome 1 64 , and complete knockout of the mouse ortholog Msh2 in Ccm1+/− mice causes multiple cavernomas through a presumed second hit 65 . Carriers had a higher, but not significantly higher, frequency of neoplasms (OR = 2.35, unadjusted P = 2.49 × 10⁻², minP-adjusted P = 1.00) and cerebral cavernomata (OR = 5.23, unadjusted P = 6.58 × 10⁻⁴, minP-adjusted P = 0.157) than non-carriers. Carriers of the 1p36. 33 49 . However, none of these six individuals had annotations beyond seizures and electroencephalography phenotypes that would support a multisystemic syndrome.
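The phenome-wide analysis combines two-sided Fisher's exact tests with a minP step-down adjustment. The sketch below illustrates both under stated assumptions. Fisher's test uses the usual rule of summing all tables with the observed margins that are no more probable than the observed table. The adjustment is the classic Westfall-Young minP step-down algorithm applied to p-values recomputed after shuffling carrier labels, which preserves the correlation among HPO terms. Whether ref. 61 uses exactly this permutation scheme (or a +1 correction in the counts) is not stated here, and all function names are hypothetical.

```python
import random
from math import comb
from typing import List, Sequence


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact P for the 2x2 table [[a, b], [c, d]].

    Rows: carriers / non-carriers; columns: annotated / not annotated with a term.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    col2 = n - col1
    # Integer hypergeometric weights for every table with the observed margins.
    weight = {x: comb(col1, x) * comb(col2, row1 - x)
              for x in range(max(0, row1 - col2), min(row1, col1) + 1)}
    observed = weight[a]
    return sum(w for w in weight.values() if w <= observed) / comb(n, row1)


def permuted_pvalues(is_carrier: Sequence[bool], annotations: Sequence[Sequence[bool]],
                     n_perm: int, seed: int = 0) -> List[List[float]]:
    """Recompute every term's Fisher P after shuffling carrier labels n_perm times."""
    rng = random.Random(seed)
    labels = list(is_carrier)
    results = []
    for _ in range(n_perm):
        rng.shuffle(labels)
        row = []
        for term in annotations:
            a = sum(1 for car, ann in zip(labels, term) if car and ann)
            b = sum(labels) - a                    # carriers without the term
            c = sum(1 for ann in term if ann) - a  # non-carriers with the term
            d = len(labels) - a - b - c
            row.append(fisher_exact_two_sided(a, b, c, d))
        results.append(row)
    return results


def minp_step_down(observed: Sequence[float],
                   permuted: Sequence[Sequence[float]]) -> List[float]:
    """Westfall-Young minP step-down adjusted P-values (one CNV group, m terms)."""
    m, n_perm = len(observed), len(permuted)
    order = sorted(range(m), key=lambda j: observed[j])  # most significant first
    hits = [0] * m
    for perm in permuted:
        running_min = float("inf")
        # Successive minima from the least to the most significant hypothesis.
        for rank in range(m - 1, -1, -1):
            running_min = min(running_min, perm[order[rank]])
            if running_min <= observed[order[rank]]:
                hits[rank] += 1
    adjusted, previous = [0.0] * m, 0.0
    for rank in range(m):
        previous = max(previous, hits[rank] / n_perm)  # enforce monotonicity
        adjusted[order[rank]] = previous
    return adjusted


print(round(fisher_exact_two_sided(6, 2, 1, 4), 4))  # 0.1026
perms = [[0.5, 0.2, 0.02], [0.005, 0.6, 0.7], [0.3, 0.035, 0.9], [0.8, 0.9, 0.02]]
print(minp_step_down([0.01, 0.04, 0.03], perms))    # [0.25, 0.5, 0.5]
```

In a real analysis, `permuted` would come from `permuted_pvalues` with many thousands of permutations rather than the four hand-written vectors used here.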
Finally, clinicians may want to know the frequency of broad clinical features among carriers of the CNV identified in their patients, to better interpret its clinical relevance and to facilitate genetically stratified prognostication. Therefore, we prioritized 17 common, conceptually broad, and important epilepsy manifestations and comorbidities for visualization, including the co-occurrence of generalized-onset and focal-onset seizures that characterizes the combined generalized and focal epilepsy type 62 (Fig. 3).

Table 3 | In the first column, the genomic band and coordinates of the considered CNV are reported. The CNV type is reported in column 2. In column 3, the HPO term name and identifier are reported. In column 4, the odds ratio with unadjusted two-sided 95% confidence interval is reported. In column 5, the relative risk is given to aid interpretation. In column 6, the unadjusted two-sided P-values from Fisher's exact test are reported. In column 7, the minP step-down P-value is given, which adjusts for all 1,667 HPO term associations tested within each CNV group while accounting for the correlation between harmonized HPO annotations (see Online Methods). In column 8, the proportion of CNV carriers annotated with the phenotype is given. In columns 9-10 and 11-12, N pheno and N tot are the number of individuals annotated with the phenotype and the total number of individuals among CNV carriers and non-carriers, respectively.
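The error bars in Fig. 3 are binomial confidence intervals computed with R's stats::binom.test(), which reports the Clopper-Pearson ("exact") interval. For readers working outside R, the sketch below computes the same kind of interval in plain Python by bisection on the binomial tail probabilities. It is an illustrative reimplementation, not the code used for the figure, and the function name is hypothetical.

```python
import math
from typing import Tuple


def clopper_pearson(k: int, n: int, alpha: float = 0.05) -> Tuple[float, float]:
    """Two-sided (1 - alpha) Clopper-Pearson interval for k successes out of n."""
    log_choose = [math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                  for i in range(n + 1)]

    def cdf(x: int, p: float) -> float:
        """P(X <= x) for X ~ Binomial(n, p), with each term computed in log space."""
        lp, lq = math.log(p), math.log1p(-p)
        return sum(math.exp(log_choose[i] + i * lp + (n - i) * lq) for i in range(x + 1))

    def bisect(f, target: float, increasing: bool) -> float:
        lo, hi = 0.0, 1.0
        for _ in range(100):
            mid = (lo + hi) / 2
            if (f(mid) < target) == increasing:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower bound solves P(X >= k | p) = alpha/2; this tail increases with p.
    lower = 0.0 if k == 0 else bisect(lambda p: 1.0 - cdf(k - 1, p), alpha / 2, True)
    # Upper bound solves P(X <= k | p) = alpha/2; this tail decreases with p.
    upper = 1.0 if k == n else bisect(lambda p: cdf(k, p), alpha / 2, False)
    return lower, upper


lo, hi = clopper_pearson(5, 10)
print(round(lo, 4), round(hi, 4))  # 0.1871 0.8129

# Widest case for the whole Phenomic cohort: an observed proportion of 50%.
lo_all, hi_all = clopper_pearson(5_440, 10_880)
print(f"{hi_all - lo_all:.4f}")    # just under 0.019
```

The second call illustrates why intervals for the full cohort of 10,880 are at most about 1.9% wide, whereas intervals for CNV groups with a few dozen carriers span tens of percentage points.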

Discussion
In this study, we leveraged a substantial increase in sample size to identify novel seizure-associated CNVs by jointly analyzing 26,699 individuals with various types of seizure disorders against 492,324 population controls. We identified 25 loci with genome-wide significance for seizure disorders, 22 of which are novel. All three loci previously reported at genome-wide significance for epilepsy maintained genome-wide significance for seizure disorders in our meta-analysis, which included the epilepsy cohort from the previous study 20 . Of the 25 seizure-associated loci, 16 were previously implicated in neurological and psychiatric disorders, including epilepsy. Five were flanked by known segmental duplications (SDs) or low copy repeats (LCRs). Of note, our fine-mapping analysis confirmed the first and third known critical regions for seizures within the phenotype spectrum of the 1p36 deletion syndrome 38 and TBX1 as the (known) causal gene for the 22q11.21 deletion/DiGeorge syndrome 48 , and it suggested the SNRPN promoter/exon 1 region as the causal element for seizures within the larger BP2-BP3 15q11.2-q13 duplication region. However, our study design did not allow us to assess whether the imprinting status (i.e., the parent of origin) of the duplicated region itself plays an additional role besides the previously suggested role of the SNRPN promoter/exon 1 region in regulating the imprinting of the Prader-Willi critical region. Future studies that also include genomic screens of parents could address this open question. In a high-resolution phenomic analysis of a subset of 10,880 individuals with epilepsy from our cohort (from the Epi25 cohort), we identified 622 suggestive and 19 significant clinical associations among CNV carriers that are informative for epileptologists. This observation suggests that, beyond contributing to the generic risk of seizures, several CNVs contribute to specific epilepsy types.
Carriers of some CNVs tended to have features typical of developmental and epileptic encephalopathies, with neurodevelopmental and non-seizure phenotypes. Conversely, carriers of other CNVs had phenotypes restricted to the core epileptic features of seizures and electroencephalographic abnormalities (both generalized and focal). Interestingly, reciprocal CNVs involving 22q11.21 seemed to be associated with opposite epilepsy types, with deletion carriers tending to have generalized epilepsies and duplication carriers tending to have focal epilepsies. Dose-dependent effects of KLHL22 on DEPDC5 degradation are a possible explanation 68 . Overall, the high degree of pleiotropy among seizure-associated CNVs implies that these CNVs likely impair neurodevelopmental processes rather generically and contribute to the broad spectrum of neurodevelopmental disorders. Under an oligogenic or polygenic inheritance model, CNVs may interact with the genetic background or environmental factors to produce the final disease phenotype. Interaction between CNVs and the polygenic background was recently demonstrated in carriers of the schizophrenia-associated 22q11.2 deletion 69 . Support for an oligogenic CNV disorder model was also recently published 70 .
Genome-wide genetic screening for pathogenic CNVs is recommended as a first-tier approach for the postnatal evaluation of individuals with intellectual disability, developmental delay, autism spectrum disorder, or multiple congenital anomalies, and for the prenatal evaluation of fetuses with structural anomalies observed by ultrasound [71][72][73] . It has previously been shown that CNVs confer significant risk for epilepsy 1,2,4-8,10,13,74 , particularly in individuals with comorbid neurodevelopmental disorders such as intellectual disability 21,[74][75][76] . In contrast to single nucleotide polymorphism (SNP) GWASs for epilepsy or seizures, where the effect sizes of identified variants are small (OR < 2) 77,78 , the effect sizes of the 25 CNVs identified in this study are large (median OR = 11, range 2-53). Our high-resolution phenomic analysis of 10,880 individuals with epilepsy grouped by CNV carrier status illustrates the seizures, EEG and brain imaging findings, and neurodevelopmental and other comorbidities associated with each CNV. This genotype-first approach complements the traditional single-phenotype, case-control paradigm by taking a simultaneous phenome-wide perspective in individuals who were deeply phenotyped according to standardized protocols before CNV discovery or genetic association testing. We found phenotypic evidence supporting associations between CNVs, broad markers of epilepsy types, and fine-grained phenotypes. The clinically recognizable high-resolution phenotype associations derived from the HPO phenotype association analysis, together with the disease risk estimates from the meta-analysis for each CNV, can enhance the interpretation of clinical relevance and pathogenicity following the American College of Medical Genetics and Genomics (ACMG) copy number variant interpretation guidelines 24 .

Fig. 3 | Summary clinical signatures of CNVs in a deeply phenotyped epilepsy cohort. The percentage of carriers of the CNV with each broad phenotype is shown by the height of bars arranged on a polar axis, with two-sided 95% confidence interval error bars for these percentages derived from the binomial distribution using stats::binom.test(). For reference, dots indicate the percentage of the entire Phenomic cohort of 10,880 people with each broad phenotype (representing the prior probability of a person having the phenotype without genetic stratification). The two-sided 95% binomial confidence intervals for a cohort size of 10,880 are no wider than 1.9% (not shown for clarity). "Craniofacial or skeletal dysmorphism" includes individuals with either "Abnormality of the head [HP:0000234]" (which excludes isolated brain structural abnormalities) or "Abnormal skeletal morphology [HP:0011842]". "Motor, movement or muscular disorder" includes individuals with any of "Abnormal central motor function [HP:0011442]", "Abnormality of movement [HP:0100022]", or "Abnormality of the musculature [HP:0003011]", but not "Motor delay [HP:0001270]", which is included in "Neurodevelopmental abnormality". Although "Neurodevelopmental abnormality" includes those with "Intellectual disability", the latter is shown separately because it is a neurodevelopmental outcome with particularly important socioeconomic consequences. EEG, electroencephalogram. Further CNV profiles are shown in Supplementary Fig. 2.

Our study has several limitations. First, many of the patients with seizures included in this study have comorbid neurological and psychiatric disorders. Therefore, some of the identified CNV loci may be associated with other clinical phenotypes present in a high percentage of all cases. Second, we did not detect robust associations with two important outcomes in our HPO analysis: refractory drug response and sudden unexpected death.
Sudden unexpected death in epilepsy is poorly suited to cross-sectional studies: it was annotated in only 4 of 10,880 individuals, far fewer cases than would be expected over follow-up of this cohort of individuals requiring tertiary center care 79 . This emphasizes the open-world interpretation required for our results: in any cross-sectional study of a disorder with inherently variable phenotyping depth (epilepsy presentations can often be classified only incompletely) 1,62 and with age-dependent phenotypes (such as some seizure types, autism, and intellectual disability), the absence of an annotation should rarely be interpreted as the absence of that phenotype over the lifetime of the carrier. Thus, the proportion of individuals annotated with a phenotype is likely lower than the actual proportion manifesting it over their lifetimes 80 . Third, in contrast to conventional SNP-based GWASs, CNV-GWASs face major challenges in identifying the causal gene(s) affected by a CNV. Among the 25 identified CNVs, deletions ranged from 230 kb to 5 Mb and duplications from 290 kb to 8.9 Mb, affecting 14.2 genes on average. CNV breakpoints in the current study are estimated from genotyped SNPs flanking the actual breakpoint, and these estimates are limited by the resolution of the genotyping platform used to call the CNVs. Microarrays have well-known technical limitations, such as poor breakpoint resolution and limited sensitivity for small CNVs 81 . Newer technologies such as whole-genome sequencing (WGS) will enable the assessment of a more comprehensive range of rare variants, including balanced rearrangements, small (exonic) CNVs 82 , short tandem repeats, and other structural variants 83 . However, some genomic regions harbor complex deletion/duplication/inversion rearrangements (e.g., 22q11.21 84 , 15q11.2 85 ) that can even show population stratification (e.g., 16p11.2 86 ).
More accurate and complete (pangenome) references will be needed to determine the exact breakpoints of such complex rearrangements 87,88 , even for sequencing-based CNV discovery. Lastly, we performed joint epilepsy/seizure and cross-disorder meta-analyses in individuals with minimal clinical information. Future studies with access to rich clinical metadata, such as electronic health records, will likely identify additional seizure-associated CNVs. It is important to consider the inclusion criteria for this cohort and the definition of cases and controls when interpreting associations and their relevance to a patient. Our phenomic analysis used data from years 1-3 of the Epi25 Collaborative, whose participants were predominantly recruited from academic epilepsy centers and were of European ancestry (92.9%, see Online Methods). Additionally, we screened cases to exclude those with brain trauma, meningitis, or encephalitis. Thus, our clinical associations should be considered most valid in individuals of European ancestry with likely genetic or unexplained epilepsies attending specialist epilepsy centers. Analyses of data from subsequent years of Epi25 may be more applicable to other populations.
Large-scale collaborations that enable the aggregation of massive datasets have greatly advanced epilepsy research and the discovery of genetic risk factors through GWASs. Here, we have extended this framework to CNV discovery by meta-analyzing epilepsy and seizure disorders, followed by additional meta-analyses in neuropsychiatric disorders and traits to explore pleiotropy. We also identified fine-grained genotype-phenotype associations and clinical profiles for each CNV. Our results will help refine promising candidate CNVs associated with specific epilepsy types and extend their clinical value. We are confident that applying this framework to even larger datasets has the potential to advance the discovery of all clinically relevant risk loci, ultra-rare high-risk CNVs missed by this study, and the underlying genes or functional elements.

Study cohorts
Each center's ethics committees/institutional review boards approved data collection and use. For the Epi25 cohort, patients or their legal guardians provided signed informed consent/assent according to local IRB requirements; as samples had been collected over 20 years in some centers, forms reflected standards at the time of collection. For Epi25 Consortium samples collected after 25th January 2015, forms required specific language according to the NIH Genomic Data Sharing Policy.

Individuals with clinically defined epilepsy - Epi25 Collaborative
Individuals with ILAE-defined epilepsy (N = 16,109) were recruited through the Epi25 Collaborative. Epilepsy was diagnosed according to clinical criteria (clinical interview, neurological examination, EEG, imaging data), following the International League Against Epilepsy (ILAE) classifications 89 . All cohorts are detailed in Supplementary Table 1.

CNV calling and quality control - Epi25 Collaborative
We restricted our analysis to autosomal CNVs because of their higher call quality and followed the quality control (QC) pipeline developed in our previous study 20 . QC was performed in two major steps: (1) pre-CNV calling QC and (2) post-CNV calling QC. For pre-CNV calling QC, we excluded samples with a call rate <0.96 or discordant sex status. To select individuals of European ancestry, we filtered autosomal SNPs for low genotyping rate (<0.98), a large difference in minor allele frequency between cases and controls (>0.05), and deviation from Hardy-Weinberg equilibrium (HWE; P ≤ 0.001), and pruned the remaining SNPs for linkage disequilibrium (--indep-pairwise 200 100 0.2) using PLINK v1.9 91 . We then performed a principal component analysis (PCA) of the Epi25 cases and controls using PLINK v1.9 91 and GCTA 92 . Individuals of European ancestry were defined as those clustering with the European samples of the 1000 Genomes Project 93 . We created GC wave-adjusted LRR (log R ratio) intensity files for all samples using PennCNV, generated a custom population B-allele frequency file, and employed PennCNV's CNV calling algorithms 2,94 to detect CNVs in our dataset.
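The SNP-level filters applied before the ancestry PCA can be expressed as a simple predicate. The sketch below is illustrative, with hypothetical function names and genotype counts. PLINK computes HWE P-values with an exact test, whereas this self-contained version uses the 1-degree-of-freedom chi-square approximation, so borderline P-values can differ from PLINK's.

```python
import math


def hwe_chisq_p(n_hom_ref: int, n_het: int, n_hom_alt: int) -> float:
    """Chi-square (1 df) P-value for departure from Hardy-Weinberg proportions."""
    n = n_hom_ref + n_het + n_hom_alt
    p = (2 * n_hom_ref + n_het) / (2 * n)  # reference allele frequency
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_hom_ref, n_het, n_hom_alt)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    return math.erfc(math.sqrt(chi2 / 2.0))  # survival function of chi-square with 1 df


def keep_snp(call_rate: float, maf_cases: float, maf_controls: float, hwe_p: float) -> bool:
    """Thresholds from the text: call rate >= 0.98, |MAF difference| <= 0.05, HWE P > 0.001."""
    return call_rate >= 0.98 and abs(maf_cases - maf_controls) <= 0.05 and hwe_p > 0.001


print(hwe_chisq_p(25, 50, 25))  # 1.0 (exact Hardy-Weinberg proportions)
print(keep_snp(0.99, 0.20, 0.22, hwe_chisq_p(360, 480, 160)))  # True
```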
The post-CNV calling QC included the following steps: (1) CNV calls of the same type (deletion or duplication) were merged if the number of SNP/intensity markers between them was <20% of the total number of markers spanned by the combined segment; (2) CNVs supported by <20 markers, shorter than 20 kb, or with a SNP density <0.0001 were excluded from subsequent analyses; (3) CNVs that overlapped other CNVs in ≥1% of all samples within the Epi25 dataset were excluded to remove potential platform-specific artifacts; (4) CNVs with >50% overlap with telomeric, centromeric, or immunoglobulin regions of the hg19 reference assembly were excluded; (5) CNVs with ≥50% overlap with reported common CNVs (allele frequency >1%) in two independent CNV reference catalogs (DGV Gold Standard Dataset 95 ; DECIPHER Population Copy-Number Variation Frequencies 96 ) were excluded. Finally, the probe-level intensity plots of all CNVs supporting the seizure-associated regions (Table 1) were visually inspected to exclude any remaining artifacts. The DGV Gold Standard and DECIPHER Population frequencies of the remaining CNVs are given in Supplementary Table 4.
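Steps (1) and (2) of the post-calling QC can be expressed as two small functions. This is a minimal sketch, assuming that calls are ordered along the chromosome, that the number of markers in the gap between two calls is known, and that SNP density is measured in markers per base pair; the CNV record and function names are hypothetical and not part of PennCNV.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CNV:
    sample: str
    chrom: str
    start: int      # bp, hg19
    end: int        # bp, hg19
    n_markers: int  # SNP/intensity markers supporting the call
    cn_type: str    # "DEL" or "DUP"


def merge_adjacent(a: CNV, b: CNV, gap_markers: int,
                   max_gap_frac: float = 0.20) -> Optional[CNV]:
    """Merge two consecutive calls (a before b) if the markers in the gap are
    <20% of all markers spanned by the combined segment; otherwise return None."""
    same = (a.sample, a.chrom, a.cn_type) == (b.sample, b.chrom, b.cn_type)
    total = a.n_markers + gap_markers + b.n_markers
    if same and gap_markers < max_gap_frac * total:
        return CNV(a.sample, a.chrom, a.start, b.end, total, a.cn_type)
    return None


def passes_call_filters(c: CNV, min_markers: int = 20, min_len: int = 20_000,
                        min_density: float = 1e-4) -> bool:
    """Keep a CNV only if it has >=20 markers, is >=20 kb long and has a
    marker density >=0.0001 markers per bp (density unit assumed)."""
    length = c.end - c.start
    density = c.n_markers / length
    return c.n_markers >= min_markers and length >= min_len and density >= min_density


a = CNV("s1", "1", 1_000_000, 1_050_000, 50, "DEL")
b = CNV("s1", "1", 1_060_000, 1_100_000, 40, "DEL")
merged = merge_adjacent(a, b, gap_markers=10)  # 10 < 0.2 * 100, so the calls merge
```

Merging first and filtering second matters: two fragments of one true CNV might each fail the 20-marker or 20 kb criteria on their own but pass once rejoined.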

Individuals with seizures or neuropsychiatric phenotypes - neuropsychiatric disorders cohort
A large CNV dataset from individuals with a range of neuropsychiatric disorders (including seizure disorders) was aggregated from 17 different sources by Collins et al. 97 . The contributors of each cohort provided the specific clinical phenotypes. The aggregated individuals were grouped into 54 partially overlapping disease phenotypes standardized through the Human Phenotype Ontology 98 . These 54 phenotypes of Collins et al. 97 were obtained through recursive hierarchical clustering that defined a minimal set of nonredundant primary phenotypes, each including >300 samples in at least three independent cohorts and >3000 samples in total across all cohorts, and sharing <80% of its samples with any other phenotype. Of the 54 phenotypes, we selected only the neurological and psychiatric HPO-based phenotypes (N = 23, excluding Seizures; Supplementary Table 2). The hierarchical architecture of these HPO-based phenotypes allows associations to be identified at different levels, from broad to narrow phenotypes, making it possible to distinguish pleiotropic from specific associations. This dataset also included the Epi25 cohort from our previous CNV GWAS study 20 ; this earlier Epi25 cohort was excluded from the neuropsychiatric cohort for the cross-disorder meta-analyses in the present work. All the considered cohorts are listed in Supplementary Table 1. This aggregated CNV dataset comprised 248,751 individuals affected by at least one of 24 neuropsychiatric disorders (including 10,590 individuals with seizures) and 483,779 population controls.
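The three size and redundancy criteria used to retain a primary phenotype can be written as a simple check. This hypothetical helper does not reproduce the recursive hierarchical clustering of Collins et al.; it only tests whether one candidate phenotype meets the thresholds stated above, and how the pairwise sample overlap is normalized is an assumption left to the caller.

```python
from typing import Dict, Iterable


def is_primary_phenotype(samples_per_cohort: Dict[str, int],
                         overlap_with_others: Iterable[float],
                         min_per_cohort: int = 300, min_cohorts: int = 3,
                         min_total: int = 3000, max_overlap: float = 0.80) -> bool:
    """Check the size/redundancy criteria for a candidate phenotype.

    samples_per_cohort: number of samples with the phenotype in each cohort.
    overlap_with_others: fraction of sample overlap with every other phenotype
    (normalization of this fraction is an assumption of the sketch).
    """
    n_large = sum(n > min_per_cohort for n in samples_per_cohort.values())
    total = sum(samples_per_cohort.values())
    return (n_large >= min_cohorts
            and total > min_total
            and all(f < max_overlap for f in overlap_with_others))


# Usage: three cohorts with >300 samples, 3100 in total, overlaps below 80%
print(is_primary_phenotype({"A": 1200, "B": 900, "C": 1000}, [0.5, 0.2]))
```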

Quality control - neuropsychiatric disorders cohort
The CNV harmonization procedure for the neuropsychiatric cohort is described in the Supplementary Materials of Collins et al. 97 and included the following steps: (1) CNV calls of the same type (deletion or duplication) were merged if their breakpoints were within ±25% of the size of the corresponding original CNV calls, to avoid oversegmentation of large CNVs; (2) CNVs not mapped to autosomes of the primary hg19 assembly were excluded; (3) only CNVs between 100 kb and 20 Mb in size (inclusive) were considered; (4) CNVs that matched reported common CNVs (allele frequency >1%) in three independent CNV reference catalogs derived from genome sequencing (Abel et al. 99 ; Collins et al. 100 ; Sudmant et al. 81 ) were excluded; (5) CNVs that overlapped other CNVs in ≥1% of samples within the same dataset or in any of the other array CNV datasets were excluded to remove potential platform-specific artifacts; (6) all CNVs with ≥30% overlap with somatic hypermutable sites, segmental duplications, simple/low-complexity/satellite repeats, or N-masked bases of the hg19 reference assembly were excluded.
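Steps (3) and (6) above amount to a size check and an interval-coverage check. A minimal sketch, assuming half-open base-pair intervals and a precomputed list of masked regions supplied by the caller (hypothetical input; the actual hypermutable, segmental-duplication, repeat, and N-masked tracks are not reproduced here):

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) in bp


def covered_fraction(start: int, end: int, masked: List[Interval]) -> float:
    """Fraction of [start, end) covered by the union of masked intervals."""
    clipped = sorted((max(s, start), min(e, end))
                     for s, e in masked if e > start and s < end)
    covered = 0
    cur_s = cur_e = None
    for s, e in clipped:
        if cur_e is None or s > cur_e:
            # disjoint from the current run: close it and start a new one
            if cur_e is not None:
                covered += cur_e - cur_s
            cur_s, cur_e = s, e
        else:
            cur_e = max(cur_e, e)  # overlapping or touching: extend the run
    if cur_e is not None:
        covered += cur_e - cur_s
    return covered / (end - start)


def keep_harmonized_cnv(start: int, end: int, masked: List[Interval],
                        min_size: int = 100_000, max_size: int = 20_000_000,
                        max_masked: float = 0.30) -> bool:
    """Size filter (step 3) followed by the masked-sequence filter (step 6)."""
    size = end - start
    if not (min_size <= size <= max_size):
        return False
    return covered_fraction(start, end, masked) < max_masked
```

Taking the union of masked intervals before dividing avoids double-counting bases that fall in, for example, both a segmental duplication and a repeat track.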

Genome-wide association analysis
We performed segment-based CNV burden analyses to identify genomic regions with a significant excess of CNVs in epilepsy cases compared to controls, separately for each CNV type (deletion or duplication). We adopted the sliding-window approach introduced by Collins et al. 26 . This model allowed association testing across all autosomes through 267,237 sliding windows with a window size of 200 kb and a step size of 10 kb, corresponding to 13,339.6 non-overlapping windows. Windows with >30% overlap with hypermutable sites, segmental duplications, simple/low-complexity/satellite repeats, or N-masked regions were excluded. For each genomic window, we counted the number of overlapping CNVs separately for cases and controls and for each CNV type. We required an overlap of ≥10% between a CNV and the genomic window so that the burden of small deletions or duplications (size ≥20 kb) could be captured. We used the one-sided Fisher's exact test as the test statistic for the CNVs collapsed in each window. Case/control CNV counts and Fisher's exact tests were computed using the CNV Docker image available at https://hub.docker.com/r/talkowski/rcnv and custom Python (version 3.7.9) and R (version 3.6.1) scripts. The same procedure was applied to the cohorts of the neuropsychiatric disorder dataset, as detailed in Collins et al. 26 .
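The windowed burden test can be sketched in Python for one chromosome and one CNV type. This is not the rCNV Docker pipeline: it assumes that "≥10% overlap" refers to the fraction of the 200 kb window covered by a CNV, counts each carrier once per window, and omits the exclusion of masked windows. The one-sided test uses scipy.stats.fisher_exact with alternative="greater" on a 2 × 2 table of carriers and non-carriers; the function names are hypothetical.

```python
from typing import List, Tuple

from scipy.stats import fisher_exact

WINDOW = 200_000    # bp
STEP = 10_000       # bp
MIN_OVERLAP = 0.10  # assumed: fraction of the window length covered by the CNV


def make_windows(chrom_len: int, window: int = WINDOW,
                 step: int = STEP) -> List[Tuple[int, int]]:
    """Half-open sliding windows [s, s + window) tiling one chromosome."""
    return [(s, s + window) for s in range(0, max(chrom_len - window, 0) + 1, step)]


def carriers_in_window(cnvs, win, min_overlap: float = MIN_OVERLAP) -> int:
    """Count samples with a CNV covering >= min_overlap of the window.

    cnvs: iterable of (sample_id, start, end) for one CNV type on one chromosome.
    """
    w_start, w_end = win
    hits = {sample for sample, start, end in cnvs
            if min(end, w_end) - max(start, w_start) >= min_overlap * (w_end - w_start)}
    return len(hits)


def window_burden_test(case_cnvs, control_cnvs, n_cases: int, n_controls: int, win):
    """One-sided Fisher's exact test for an excess of CNV carriers among cases."""
    a = carriers_in_window(case_cnvs, win)
    c = carriers_in_window(control_cnvs, win)
    table = [[a, n_cases - a], [c, n_controls - c]]
    _, p = fisher_exact(table, alternative="greater")
    return a, c, p
```

The large overlap between neighboring windows (a 10 kb step on 200 kb windows) is what makes the 267,237 tests far from independent, which motivates the effective number of windows used for the genome-wide threshold.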

Meta-analysis and fine-mapping
Fixed-effects meta-analyses were performed using the metafor package in R (version 3.6.1) with an empirical continuity correction 101 , and a saddlepoint re-approximation of the null distribution was used for inference. The meta-analysis procedure is detailed in Collins et al. 26 . We meta-analyzed the effect sizes from 7 GWAS derived from the 17 cohorts of the neuropsychiatric disorder dataset together with the segment-based results of the Epi25 dataset. The threshold for genome-wide significance was set to α = 3.74 × 10⁻⁶ after Bonferroni correction for multiple testing, based on the number of independent, nonoverlapping 200 kb windows; this number was calculated by merging all overlapping windows and dividing the sum of their sizes by 200 kb (effective N = 13,339.6 independent windows). To account for possible cohort-specific biases, we required each segment to fulfill two additional criteria: (1) nominally significant P-values (P < 0.05) in at least two cohorts, and (2) a meta-analysis P < 0.05 after excluding the single most significant cohort. We then used a Bayesian algorithm 102 to identify the minimal credible interval(s) containing the causal element(s) or genes with 95% confidence, as in Collins et al. 97 . Finally, we reviewed the known biological functions of all genes within the credible intervals and performed pathway analyses using Enrichr 103,104 (https://maayanlab.cloud/Enrichr/). All resources used to investigate the existing knowledge on the seizure-associated CNV regions are described in Supplementary Table 5.
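The multiple-testing threshold and the two robustness filters can be made concrete with a short Python sketch. With an effective number of 13,339.6 windows, 0.05 / 13,339.6 ≈ 3.748 × 10⁻⁶, the basis of the genome-wide threshold quoted above. The credible_set helper is a generic illustration that assumes per-element posterior probabilities are already available and normalized; it is not the Bayesian fine-mapping algorithm of reference 102, and all function names are hypothetical.

```python
from typing import List, Sequence, Tuple

WINDOW = 200_000  # bp


def effective_n_windows(windows: List[Tuple[int, int]], window: int = WINDOW) -> float:
    """Merge overlapping windows and express their total length in window units."""
    merged: List[List[int]] = []
    for start, end in sorted(windows):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)  # extend the current block
        else:
            merged.append([start, end])
    return sum(end - start for start, end in merged) / window


def bonferroni_alpha(n_eff: float, alpha: float = 0.05) -> float:
    return alpha / n_eff


def passes_robustness(cohort_p: Sequence[float], meta_p_without_top: float,
                      nominal: float = 0.05) -> bool:
    """(1) >=2 cohorts with P < 0.05 and (2) leave-top-cohort-out meta P < 0.05.

    meta_p_without_top must come from re-running the meta-analysis without the
    single most significant cohort; that step is not reproduced here.
    """
    return sum(p < nominal for p in cohort_p) >= 2 and meta_p_without_top < nominal


def credible_set(posteriors: Sequence[float], level: float = 0.95) -> List[int]:
    """Smallest set of elements whose (normalized) posteriors sum to >= level."""
    order = sorted(range(len(posteriors)), key=lambda i: posteriors[i], reverse=True)
    chosen, total = [], 0.0
    for i in order:
        chosen.append(i)
        total += posteriors[i]
        if total >= level:
            break
    return chosen


print(bonferroni_alpha(13_339.6))  # approximately 3.748e-06
```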

Detailed HPO characterization of Epi25 participants
To identify phenotypic associations with each of the CNVs within a cohort of individuals with epilepsy, we translated clinical data from years 1-3 of the deeply phenotyped international Epi25 Collaborative cohort into Human Phenotype Ontology (HPO, version released 2022-02-14) concepts, following our optimization of the HPO for epilepsy phenotypes 105 . We selected only individuals with CNV data and sufficiently detailed clinical data (as of 2022-01-25) to confirm the presence of seizures or of epileptic encephalopathy with continuous spike-and-wave in sleep (EE-SWAS, an epilepsy syndrome in which overt clinical seizures may not always be observed). Categorical clinical data were mapped to HPO concepts using a data dictionary. Free-text data were manually annotated with HPO terms (D.L.S. under the supervision of I.H. and R.H.T.) 25 . Quantitative data on gestational age, weight, and head circumference at birth were categorized to match HPO definitions using sex-stratified distributions from the INTERGROWTH-21st Project, as implemented in the R growthstandards package (version 0.1.5) 106 .
We inferred all HPO concepts applicable to each individual by propagating the concepts translated from their clinical data along the is_a relationships of the HPO, as previously described 107 , using the R ontologyIndex package (version 2.7) 108 . We excluded HPO terms that carried no information in the context of this cohort (those annotated ubiquitously) and modified the relationships of others to tailor them to this analysis (Supplementary Table 6).
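Propagation along is_a relationships is an ancestor closure over the ontology graph. The Python sketch below uses a hypothetical three-term fragment built from the Typical absence seizure, Generalized non-motor (absence) seizure, and Generalized-onset seizure chain described in the phenome-wide association analysis (the real analysis used the R ontologyIndex package and the full HPO), and also shows the removal of ubiquitously annotated terms. Function names are illustrative only.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set

# Toy is_a fragment; in the real HPO, Generalized-onset seizure has further
# ancestors that are omitted here.
PARENTS: Dict[str, List[str]] = {
    "HP:0011147": ["HP:0002121"],  # Typical absence seizure
    "HP:0002121": ["HP:0002197"],  # Generalized non-motor (absence) seizure
    "HP:0002197": [],              # Generalized-onset seizure
}


def propagate(terms: Iterable[str], parents: Dict[str, List[str]]) -> Set[str]:
    """Return the given terms plus all their ancestors along is_a edges."""
    out: Set[str] = set()
    stack = list(terms)
    while stack:
        term = stack.pop()
        if term not in out:
            out.add(term)
            stack.extend(parents.get(term, []))
    return out


def drop_ubiquitous(annotations: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Remove terms annotated in every individual (uninformative in this cohort)."""
    counts = Counter(t for terms in annotations.values() for t in terms)
    n = len(annotations)
    ubiquitous = {t for t, k in counts.items() if k == n}
    return {ind: terms - ubiquitous for ind, terms in annotations.items()}


cohort = {
    "ind1": propagate({"HP:0011147"}, PARENTS),
    "ind2": propagate({"HP:0002197"}, PARENTS),
}
informative = drop_ubiquitous(cohort)  # HP:0002197 is present in both, so it is dropped
```

Because an annotation of a specific term always implies its ancestors, propagated annotations are nested and correlated, which is the dependence structure the phenome-wide analysis has to account for.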
Phenotypes were annotated as being explicitly present or not, without annotating any phenotype as explicitly absent. This open-world perspective is conservative: the proportion of individuals in a group annotated with a particular phenotype should be considered a lower limit. At the same time, it still allows statistical testing of phenotypic associations and mitigates the risk of annotating a phenotype as absent when it was present but not recorded, or when the individual will manifest it at some point in the future 80 .
After excluding individuals with markers of acquired epilepsy that are unlikely to be part of the phenotype, such as significant brain trauma, encephalitis, or meningitis, 10,880 individuals from the genomic analysis had adequate phenotypic data available for analysis. Of these, 10,106 individuals were of European ancestry, 602 of East Asian ancestry, and 172 of African ancestry, according to PCA. After propagation to infer generic phenotypic descriptors from specific ones, this cohort had 214,203 informative annotations (median = 17 per individual, range = 1-128), spanning a repertoire of 1667 phenotypic concepts. The frequency of annotation of all 1667 phenotypes is available in Supplementary Data 4.

Phenome-wide association analysis of CNVs
All association analyses and phenomic visualizations were performed in R. Associations between CNVs and HPO concepts were calculated using Fisher's exact test (function fisher.test from the stats package). The tested phenotypes were all 1667 informative (not ubiquitous) HPO terms translated from clinical data, which are detailed in Supplementary Table 3. Although this was a descriptive analysis, given the large number of tests performed ((29 groups of multiple individuals + 2 groups of a single individual) × 1667 HPO concepts = 51,677), we sought to aid the identification of the most robust associations. Bonferroni's single-step and Holm's step-down adjustments are overly conservative given the dependence structure of propagated HPO annotations. For example, after full harmonization, annotations of Typical absence seizure [HP:0011147], Generalized non-motor (absence) seizure [HP:0002121], and Generalized-onset seizure [HP:0002197] will be highly correlated, because, as a result of their is_a relationships in the HPO, an individual cannot have the first without the second, or the second without the last. Therefore, we applied the minP step-down procedure, which uses a permutation-based approach to control the family-wise error rate 61 . We generated 100,000 random groups of individuals of size N from the Epi25 phenomic analysis cohort, where N is the number of carriers of each CNV. For each of these groups, we then calculated two-sided Fisher's exact test P-values for every one of the 1667 HPO concepts. We used the adj_Wstep function from the NRejections package (version 1.2.0) in R to perform the step-down procedure, which yields P-values corrected for the correlation-adjusted number of tested HPO annotations. We did not adjust P-values across CNVs because our aim was only to identify the most robust associations in this descriptive analysis.
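A from-scratch Python sketch of the permutation-based minP step-down logic (Westfall-Young) is shown below, assuming a boolean individuals × terms annotation matrix. It mirrors the steps described above (random groups of the carrier-group size, two-sided Fisher's exact tests per term, step-down over the ordered P-values), but it is not the NRejections adj_Wstep implementation, and the function names are hypothetical. With 100,000 permutations, a vectorized hypergeometric computation would be preferable to this loop over scipy.stats.fisher_exact.

```python
import numpy as np
from scipy.stats import fisher_exact


def fisher_pvalues(annot: np.ndarray, in_group: np.ndarray) -> np.ndarray:
    """Two-sided Fisher's exact P for each term: group vs rest of the cohort.

    annot: boolean array (n_individuals, n_terms); in_group: boolean mask.
    """
    pvals = np.empty(annot.shape[1])
    for t in range(annot.shape[1]):
        a = int(np.sum(annot[in_group, t]))
        b = int(np.sum(in_group)) - a
        c = int(np.sum(annot[~in_group, t]))
        d = int(np.sum(~in_group)) - c
        _, pvals[t] = fisher_exact([[a, b], [c, d]], alternative="two-sided")
    return pvals


def permutation_null(annot: np.ndarray, group_size: int, n_perm: int,
                     rng: np.random.Generator) -> np.ndarray:
    """P-values for random groups of the same size as the carrier group."""
    n = annot.shape[0]
    null = np.empty((n_perm, annot.shape[1]))
    for i in range(n_perm):
        mask = np.zeros(n, dtype=bool)
        mask[rng.choice(n, size=group_size, replace=False)] = True
        null[i] = fisher_pvalues(annot, mask)
    return null


def minp_stepdown(p_obs, p_null: np.ndarray) -> np.ndarray:
    """Westfall-Young minP step-down adjusted P-values."""
    p_obs = np.asarray(p_obs, dtype=float)
    order = np.argsort(p_obs)  # most significant term first
    adj_sorted = np.empty(len(p_obs))
    for k, j in enumerate(order):
        remaining = order[k:]
        # null distribution of the minimum P over hypotheses not yet rejected
        min_null = p_null[:, remaining].min(axis=1)
        adj_sorted[k] = np.mean(min_null <= p_obs[j])
    adj_sorted = np.maximum.accumulate(adj_sorted)  # enforce monotonicity
    adj = np.empty_like(adj_sorted)
    adj[order] = adj_sorted
    return adj
```

Because every permutation evaluates all terms on the same random group, the minimum over terms automatically reflects correlations such as the nested absence-seizure annotations, which is why this procedure is less conservative than Bonferroni or Holm while still controlling the family-wise error rate.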

Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability
All genome-wide CNV association summary statistics are available at Zenodo (https://zenodo.org/record/7939126#.ZGK7yi-B29Y; https://doi.org/10.5281/zenodo.7939126). Individual-level CNV data for epilepsy patients are available from the Epi25 Consortium (http://epi-25.org/) upon signing the Epi25 charter and submission and acceptance of a full research proposal. Furthermore, raw data are deposited at dbGaP (https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs001551.v1.p1). All HPO-based phenome-wide summary statistics are available in Supplementary Data 3 of this manuscript. Fine-mapping results are available in Supplementary Data 1 and 2 of this manuscript. The CNV data of the neuropsychiatric cohort are described in the Supplementary Materials of Collins et al. 97 . They can be accessed from existing publications, public resources, or, upon request, from the authors of Collins et al. 97 (see "Key resources table" and Table S2 in Collins et al. 97 ). The CNV data reported by GeneDx and Indiana University clinical testing sites were not consented for public release. All datasets used in this study are detailed in Supplementary Table 1 of our manuscript.